Suppose we have several realisations of a pitch accent in two conditions. For simplicity:
We could try something like this and use lm():
\[ y_i = \beta_0 + \beta_{0, P} + \beta_1 t_i + \beta_{1, P} t_i + \beta_2 t_i^2 + \beta_{2, P} t_i^2 + \beta_3 t_i^3 + \beta_{3, P} t_i^3 + ... + \epsilon_i \] where:
In R it would look like this:
mod <- lm(y ~ Category + time + Category:time + I(time^2) + Category:I(time^2) + I(time^3) + Category:I(time^3), data = curves)
although this is quite a naive approach; see orthogonal polynomials (poly() in R) for a better-conditioned alternative.
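As a hedged sketch of that alternative (assuming the curves data frame with columns y, time and Category, as in the model above): poly() generates orthogonal polynomial terms, and the * operator expands all the interactions with Category.

```r
# Orthogonal cubic polynomial with full Category interactions: equivalent in
# spirit to the manual I(time^2), I(time^3) model above, but with
# numerically better-behaved, decorrelated predictors.
mod.poly <- lm(y ~ Category * poly(time, 3), data = curves)
```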
The model looks like:
\[ y_i = f(t_i) + f_P(t_i) + \epsilon_i \] where each \(f()\) is a SPLINE: \[ f(t) = \sum_{k=1}^K \beta_k \cdot \phi_k(t) \] In GAMM lingo, the \(f()\) are called smooths. The functions \(\phi_k(t)\) are the spline components; they are known and play the same role as the powers of \(t\) in the previous equation. In simple terms, we can imagine a smooth to be like a polynomial of unspecified degree, or in general a functional term of unspecified complexity, i.e. we do not say how many terms it should include. The GAMM machinery takes care of determining how many terms are strictly needed for our problem.
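To make the basis-expansion idea concrete, here is a sketch using the mgcv library (introduced just below; the time grid and k = 10 are illustrative assumptions, not the tutorial's data): smoothCon() exposes the design matrix whose columns are the \(\phi_k(t)\) evaluated at each time point.

```r
library(mgcv)

# Illustrative time grid (an assumption, not the tutorial's curves data)
d <- data.frame(time = seq(0, 2, length.out = 100))

# smoothCon() builds the smooth's design matrix X: column k holds phi_k(time)
sm <- smoothCon(s(time, k = 10), data = d)[[1]]
dim(sm$X)                          # 100 rows (time points) x 10 basis functions
matplot(d$time, sm$X, type = "l")  # one curve per basis function phi_k
```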
mgcv is the most popular library for GAMMs within R:

- its fitting function is bam(), equivalent to lmer() in lme4
- its formula syntax is quite different from the one used in lme4
- bam() has more possibilities than lmer()
- terms can be specified as parametric terms (as in lmer()) or as smooths, or combinations
- the interaction shorthand (a * b) is not available with smooths
- we need to construct interaction terms manually beforehand (more on this later)

| Equation | R formula |
|---|---|
| \(y_i = \beta_0 + f(t_i) + \epsilon_i\) | y ~ s(time) |
Note that regular smooths like s(time) are centered on zero, hence the intercept term in the equation. The model summary() will show two separate terms:
- (Intercept), i.e. \(\beta_0\)
- s(time), i.e. the zero-centered function \(f(t_i)\)

Example:
m1 <- bam(y ~ s(time), data = curves)
summary(m1)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(time)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.180243 0.002045 88.14 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 8.062 8.765 626.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.355 Deviance explained = 35.6%
## fREML = -1675.7 Scale est. = 0.041614 n = 9950
- Parametric coefficients: the lm-style fixed effects
- edf: estimated degrees of freedom of smooth terms; more or less, how many coefficients you need to specify the shape of a smooth. The default max number of coefficients is k = 10, minus 1, as the smooth is zero-centered, so the intercept is already specified
- fREML: fast restricted maximum likelihood estimation. Cannot be used to compare models with different fixed effects composition; for this you need to re-run your models in ML modality, see section 4.5.1 in Wieling’s tutorial
- Scale est.: residual variance
- n: number of samples
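On the fREML caveat above, a hedged sketch of the refit one would run before comparing models with different fixed effects (model and data as in the example; the object name is ours):

```r
# Refit with maximum likelihood so that models with different fixed-effects
# structures can be compared (fREML scores are not comparable in that case).
m1.ml <- bam(y ~ s(time), data = curves, method = "ML")
```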
Partial effects are smooths, which are zero-centered in their default mode. This means that they represent the ‘shape part’ of the effect, i.e. the total effect minus the mean across the time axis (or whatever continuous variable you are using).
Smooths are (implicitly) numbered by their order of appearance in the summary() output. To plot a selected smooth, two options (among many):
First option: use the plot() method from mgcv:
plot(m1, select = 1, shade = TRUE, rug = FALSE)
abline(h=0, col='red', lty=2)
Second option: extract the fitted values from the model, then plot them with ggplot or plotfunctions::plot_error.
smooth1 <- get_modelterm(m1, select = 1)
## Summary:
## * time : numeric predictor; with 30 values ranging from 0.000000 to 2.000000.
smooth1$time
## [1] 0.00000000 0.06896552 0.13793103 0.20689655 0.27586207 0.34482759
## [7] 0.41379310 0.48275862 0.55172414 0.62068966 0.68965517 0.75862069
## [13] 0.82758621 0.89655172 0.96551724 1.03448276 1.10344828 1.17241379
## [19] 1.24137931 1.31034483 1.37931034 1.44827586 1.51724138 1.58620690
## [25] 1.65517241 1.72413793 1.79310345 1.86206897 1.93103448 2.00000000
smooth1$fit
## [1] -0.00342467 0.04510567 0.09224509 0.13544434 0.17184146 0.19909197
## [7] 0.21585117 0.22182487 0.21744034 0.20336866 0.18022376 0.14866481
## [13] 0.10985977 0.06598440 0.02033317 -0.02319697 -0.06111334 -0.09135388
## [19] -0.11379613 -0.12997214 -0.14214610 -0.15224884 -0.16120094 -0.16888757
## [25] -0.17466124 -0.17798911 -0.17886685 -0.17784016 -0.17572653 -0.17324569
smooth1$se.fit
## [1] 0.02176571 0.01292935 0.01056546 0.01149424 0.01076584 0.01025022
## [7] 0.01089919 0.01077598 0.01033162 0.01072140 0.01081592 0.01038501
## [13] 0.01058404 0.01085199 0.01048968 0.01049534 0.01086372 0.01059403
## [19] 0.01038956 0.01081553 0.01071570 0.01031984 0.01076316 0.01089001
## [25] 0.01024194 0.01075161 0.01147593 0.01056255 0.01297303 0.02183860
ggplot(smooth1) +
aes(x = time, y = fit) +
geom_line() +
geom_ribbon(mapping = aes(ymin = fit - se.fit, ymax = fit + se.fit), fill = "red", alpha = 0.5) +
theme_light() +
theme(text = element_text(size = 15))
Summed effects include smooths and intercept (parametric) terms that pertain to an effect. In our case it means \(\beta_0 + f(t_i)\).
The easy way to plot summed effects is to use itsadug::plot_smooth:
plot_smooth(m1, view = "time", rug = FALSE)
## Summary:
## * time : numeric predictor; with 30 values ranging from 0.000000 to 2.000000.
## * NOTE : No random effects in the model to cancel.
##
One could also get all the relevant values and sum them manually. In our case, get samples from the smooth \(f(t_i)\) using get_modelterm as above, then add the value of \(\beta_0\) taken from the summary (i.e. m1$coefficients["(Intercept)"]).
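A sketch of that manual computation (assuming m1 and the itsadug helpers loaded as above; the object names are ours):

```r
# Sample the zero-centered smooth f(t), then add the parametric intercept
smooth1 <- get_modelterm(m1, select = 1, print.summary = FALSE)
summed  <- smooth1$fit + m1$coefficients["(Intercept)"]

# The result should match the summed-effect curve from plot_smooth()
plot(smooth1$time, summed, type = "l", xlab = "time", ylab = "summed effect")
```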
Question: the confidence intervals are quite thin in comparison with the impression of noisy signals from the raw data plot. Why is that?
The specification of smooths s() has an extra argument k whose default value is 10. k indicates the maximum number of basis functions (splines) allowed to specify the smooth, or in other words the maximum number of parameters or degrees of freedom. As regular smooths are zero-centered, a smooth with at most k = 10 parameters actually has at most k' = 9 degrees of freedom, as one is absorbed by the intercept term. The estimated degrees of freedom edf is an indicator of how many ‘equivalent parameters’ were actually utilised in the estimated smooth. When edf is near 1, the smooth is basically a straight line (only the slope parameter), in which case we would rather remove the smooth and use an ordinary linear term. Conversely, when edf is very close to k' = k - 1, as in this case, the smooth has used up all its degrees of freedom, so perhaps more might be needed.
Let’s specify a higher k and see what happens to edf for s(time):
m1.k20 <- bam(y ~ s(time, k = 20), data = curves)
summary(m1.k20)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(time, k = 20)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.180243 0.002045 88.15 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 10.35 12.63 434.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.356 Deviance explained = 35.6%
## fREML = -1676 Scale est. = 0.041601 n = 9950
edf has not increased much, and it’s well below k - 1 = 19, so the limit on k = 10 was ok.
gam.check(m1)
##
## Method: fREML Optimizer: perf newton
## full convergence after 10 iterations.
## Gradient range [-8.367351e-11,8.267342e-11]
## (score -1675.696 & scale 0.04161424).
## Hessian positive definite, eigenvalue range [3.754601,4974.003].
## Model rank = 10 / 10
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(time) 9.00 8.06 1 0.46
y in the example), and not the residual errors.

| Equation | R formula |
|---|---|
| \(y_i = \beta_0 + \beta_P + f_{NP}(t_i) + f_P(t_i) + \epsilon_i\) | y ~ Category + s(time, by = Category) |
where \(P\) indicates the PEAK category, while the default (control) level is NO_PEAK (\(NP\)), and \(i\) is the sample (not curve) index. The equation is omitting the indicator functions. The complete version is: \[ y_i = \beta_0 + \beta_P \cdot I(i \in P) + f_{NP}(t_i) \cdot I(i \in NP) + f_P(t_i) \cdot I(i \in P) + \epsilon_i \] Note the asymmetry in the conventions for the parametric and the smooth parts of the model:
- in the parametric part, as in lm(), \(\beta_0\) is the intercept and \(\beta_P\) is the difference between level \(P\) and the intercept
- in the smooth part, each level gets its own full smooth (\(f_{NP}\) and \(f_P\)), not a reference smooth plus a difference

Question: What are the summed effects for the PEAK and the NO_PEAK levels (in maths notation)?
Estimate the model with bam():
curves %<>% mutate(Category = factor(Category)) # don't forget!
m2 <- bam(y ~ Category + s(time, by = Category), data = curves)
summary(m2)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ Category + s(time, by = Category)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1796755 0.0002261 794.78 <2e-16 ***
## CategoryPEAK 0.0242948 0.0003197 75.99 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time):CategoryNO_PEAK 8.980 9 48196 <2e-16 ***
## s(time):CategoryPEAK 8.998 9 37123 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.975 Deviance explained = 97.5%
## fREML = -47131 Scale est. = 0.00050838 n = 19900
Warning: all variables that need to be used as fixed or random factors (here Category) must be converted to R class factor, otherwise bam will throw an error.
Question: is PEAK significantly different from NO_PEAK?
We can read off the summary output the statistical significance of parametric terms in the same way as with lm. In geometrical terms, we can say whether the average level of y along the time axis is different for PEAK and NO_PEAK curves.
With the model expressed as above, we cannot use the significance levels in the summary to answer questions about the difference between smooths. A small p-value only says that the smooth is significantly different from zero, which means that its shape is different from the flat line y = 0. If both PEAK and NO_PEAK smooths appear with a small p-value, it means that their shapes are not flat, i.e. in both cases y varies non-linearly with time, but nothing can be said about whether their respective shapes are similar or not.
Let’s plot first. A look at the summed effects:
plot_smooth(m2, view = "time", plot_all = "Category", col = Category.colors)
## Summary:
## * Category : factor; set to the value(s): NO_PEAK, PEAK.
## * time : numeric predictor; with 30 values ranging from 0.000000 to 2.000000.
## * NOTE : No random effects in the model to cancel.
##
A graphical way to detect statistical differences between effects is:
plot_diff(m2, view = "time", comp = list(Category = c("PEAK", "NO_PEAK")))
## Summary:
## * time : numeric predictor; with 100 values ranging from 0.000000 to 2.000000.
## * NOTE : No random effects in the model to cancel.
##
##
## time window(s) of significant difference(s):
## 0.585859 - 0.747475
## 0.828283 - 0.989899
## 1.050505 - 1.191919
## 1.232323 - 1.919192
## 1.959596 - 2.000000
The powerful emmeans function can be used on mgcv GAMMs as a way to test pointwise comparisons. Suppose we want to test whether PEAK and NO_PEAK differ at time equal to 0.0, 1.0 and 2.0. We can use emmeans in the same way as we do with lm or lmer, and we will get the correction for multiple comparisons.
emmeans(m2, pairwise ~ Category | time, at = list(time = 0:2))
## $emmeans
## time = 0:
## Category emmean SE df lower.CL upper.CL
## NO_PEAK 0.18448 0.001316 19880 0.18190 0.18706
## PEAK 0.18204 0.001318 19880 0.17945 0.18462
##
## time = 1:
## Category emmean SE df lower.CL upper.CL
## NO_PEAK 0.18333 0.000655 19880 0.18205 0.18462
## PEAK 0.18496 0.000656 19880 0.18368 0.18625
##
## time = 2:
## Category emmean SE df lower.CL upper.CL
## NO_PEAK -0.00104 0.001319 19880 -0.00363 0.00154
## PEAK 0.03245 0.001316 19880 0.02987 0.03503
##
## Confidence level used: 0.95
##
## $contrasts
## time = 0:
## contrast estimate SE df t.ratio p.value
## NO_PEAK - PEAK 0.00244 0.001862 19880 1.313 0.1892
##
## time = 1:
## contrast estimate SE df t.ratio p.value
## NO_PEAK - PEAK -0.00163 0.000927 19880 -1.758 0.0788
##
## time = 2:
## contrast estimate SE df t.ratio p.value
## NO_PEAK - PEAK -0.03349 0.001863 19880 -17.978 <.0001
While emmeans allows us to test differences at specific points on the time axis (or other continuous dimensions), we need a way to assess the significance of a smooth contrast as a whole. For this purpose, we utilise another way to specify a smooth interacting with a fixed factor with mgcv::bam, the so-called binary smooths.
A binary smooth is a smooth that represents the difference between effects (including intercepts). We use them to model the difference between two levels of a factor, or between more complex combinations of factor level interactions (more on this later). Once we are able to introduce such a term in the model, we obtain two things:

- an explicit model term for the difference itself, which we can plot
- a significance test for that difference, read directly off the summary

In order to define a convenient binary smooth set up with mgcv, first we need to define numeric variables with the meaning of logical variables (yes, you read that right), which represent the desired contrasts – a rather obscure convention adopted by the mgcv library. In our case we want to represent the difference between PEAK and NO_PEAK, with NO_PEAK as control level:
curves$IsPEAK <- (curves$Category == "PEAK") * 1
where * 1 is a quick way to transform the logical value inside () into a numeric one. The R formula and the corresponding maths form are:
| Equation | R formula |
|---|---|
| \(y_i = \beta_0 + f_0(t_i) + b_P(t_i) \cdot I(i \in P) + \epsilon_i\) | y ~ s(time) + s(time, by = IsPEAK) |
where \(f_0\) is the reference smooth (the NO_PEAK curve) and \(b_P\) is the binary difference smooth, i.e. the term to add to the reference to obtain the PEAK curve (intercept difference included):
m2.bin <- bam(y ~ s(time) + s(time, by = IsPEAK), data = curves)
summary(m2.bin)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(time) + s(time, by = IsPEAK)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1796561 0.0002261 794.7 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 8.980 9 48195 <2e-16 ***
## s(time):IsPEAK 9.995 10 3227 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.975 Deviance explained = 97.5%
## fREML = -47132 Scale est. = 0.00050838 n = 19900
| Maths | R summary |
|---|---|
| \(\beta_0\) | (Intercept) |
| \(f_0(t_i)\) | s(time) |
| \(b_P(t_i)\) | s(time):IsPEAK |
Only now are we able to report the significance of the difference between PEAK and NO_PEAK, by reading off the summary output for s(time):IsPEAK.
How does the difference look?
get_modelterm(m2.bin, select = 2) %>%
ggplot() +
aes(x = time, y = fit) +
geom_line() +
geom_ribbon(mapping = aes(ymin = fit - se.fit, ymax = fit + se.fit), fill = "red", alpha = 0.5) +
theme_light() +
theme(text = element_text(size = 15))
## Summary:
## * time : numeric predictor; with 30 values ranging from 0.000000 to 2.000000.
## * IsPEAK : numeric predictor; set to the value(s): 1.
There is a third modality for smooths, the so-called ordered factor difference smooth. It allows us to separate the intercept difference from the shape difference, i.e. it splits the \(b_P(t_i)\) term into a constant and a zero-centered smooth. See section 4.5.3 in Wieling’s tutorial for details.
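A hedged sketch of that ordered-factor setup (the variable and model names are ours; see Wieling’s tutorial, section 4.5.3, for the full recipe):

```r
# Ordered factor with treatment contrasts: the parametric CategoryO term
# captures the intercept difference, while s(time, by = CategoryO) is a
# zero-centered smooth for the shape difference PEAK - NO_PEAK.
curves$CategoryO <- as.ordered(curves$Category)
contrasts(curves$CategoryO) <- "contr.treatment"

m2.ord <- bam(y ~ CategoryO + s(time) + s(time, by = CategoryO), data = curves)
summary(m2.ord)
```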
Suppose we want to model the effect of trial, i.e. the shape of the curve y(time) changes not only according to a category (PEAK, NO_PEAK) but also along a numeric variable trial encoding at which point in the experiment we are. The data look like this:
This is a case of a two-dimensional, nonlinear interaction between two covariates, time (\(t\)) and trial (\(r\)). The default way to model it is a tensor product smooth, te() in mgcv.
| Equation | R formula |
|---|---|
| \(y_i = \beta_0 + \beta_P + f_{NP}(t_i, r_i) + f_P(t_i, r_i) + \epsilon_i\) | y ~ Category + te(time, trial, by = Category) |
Each of the tensor functions \(f\) is the generalisation of a complete interaction expression in lm:
| Maths | R formula in lm() |
|---|---|
| \(\beta_1 \cdot t_i + \beta_2 \cdot r_i + \beta_3 \cdot t_i \cdot r_i\) | time * trial, or time + trial + time:trial |
m3 <- bam(y ~ Category + te(time, trial, by = Category), data = curves)
summary(m3)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ Category + te(time, trial, by = Category)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1136007 0.0003081 368.74 <2e-16 ***
## CategoryPEAK 0.0241816 0.0004443 54.42 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(time,trial):CategoryNO_PEAK 9.001 9.002 29904 <2e-16 ***
## te(time,trial):CategoryPEAK 20.482 22.499 9117 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.96 Deviance explained = 96%
## fREML = -40957 Scale est. = 0.00094367 n = 19900
The surface plot for the PEAK and NO_PEAK conditions and slices along trial == 20 and trial == 80 (all summed effects) are:
If we were using lm instead of bam we would be able to produce similar surface visualisations, but in this case with pretty bad estimates:
m3.lm <- lm(y ~ Category * time * trial, data = curves)
summary(m3.lm)
##
## Call:
## lm(formula = y ~ Category * time * trial, data = curves)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.248702 -0.068536 0.007367 0.074851 0.217019
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.014e-01 3.358e-03 119.527 < 2e-16 ***
## CategoryPEAK -1.155e-02 5.189e-03 -2.226 0.026 *
## time -2.226e-01 2.908e-03 -76.546 < 2e-16 ***
## trial -1.320e-03 5.727e-05 -23.040 < 2e-16 ***
## CategoryPEAK:time 3.621e-02 4.488e-03 8.067 7.6e-16 ***
## CategoryPEAK:trial -1.431e-05 8.890e-05 -0.161 0.872
## time:trial 4.046e-06 4.958e-05 0.082 0.935
## CategoryPEAK:time:trial 1.260e-05 7.689e-05 0.164 0.870
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08886 on 19892 degrees of freedom
## Multiple R-squared: 0.6648, Adjusted R-squared: 0.6647
## F-statistic: 5637 on 7 and 19892 DF, p-value: < 2.2e-16
The surfaces themselves are non-linear, i.e. they are not flat, because of the product term \(\beta_3 \cdot t_i \cdot r_i\), but each slice along either time or trial is a straight line (this is called a ruled surface).
To test for significant differences between PEAK and NO_PEAK, we can use binary smooth surfaces as above:
curves$IsPEAK <- (curves$Category == "PEAK") * 1
m3.bin <- bam(y ~ te(time, trial) + te(time, trial, by = IsPEAK), data = curves)
summary(m3.bin)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ te(time, trial) + te(time, trial, by = IsPEAK)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1136094 0.0003081 368.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(time,trial) 9.0 9.001 29912.3 <2e-16 ***
## te(time,trial):IsPEAK 21.5 23.515 558.4 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.96 Deviance explained = 96%
## fREML = -40962 Scale est. = 0.00094366 n = 19900
pvisgam(m3.bin, view = c("time", "trial"), select = 2, color = c('white', 'blue'), main = "PEAK - NO_PEAK", print.summary = FALSE)
The difference PEAK - NO_PEAK is then significant, but the surface plot suggests that such a difference does not change much along trial. To disentangle this, another form of tensor product is available, ti, which isolates the interaction:
| Equation | R formula |
|---|---|
| \(y_i = \beta_0 + \beta_P +\) | y ~ Category + |
| \(f_{1, NP}(t_i) + f_{1, P}(t_i) +\) | s(time, by = Category) + |
| \(f_{2, NP}(r_i) + f_{2, P}(r_i) +\) | s(trial, by = Category) + |
| \(f_{3, NP}(t_i, r_i) + f_{3, P}(t_i, r_i) + \epsilon_i\) | ti(time, trial, by = Category) |
m3.ti <- bam(y ~ Category + s(time, by = Category) + s(trial, by = Category) + ti(time, trial, by = Category), data = curves)
summary(m3.ti)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ Category + s(time, by = Category) + s(trial, by = Category) +
## ti(time, trial, by = Category)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1133385 0.0002257 502.21 <2e-16 ***
## CategoryPEAK 0.0242401 0.0003352 72.32 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time):CategoryNO_PEAK 8.987 9.000 49238.3 <2e-16 ***
## s(time):CategoryPEAK 8.998 9.000 36950.0 <2e-16 ***
## s(trial):CategoryNO_PEAK 1.000 1.000 32487.9 <2e-16 ***
## s(trial):CategoryPEAK 7.957 8.693 2667.3 <2e-16 ***
## ti(time,trial):CategoryNO_PEAK 5.236 6.161 3286.6 <2e-16 ***
## ti(time,trial):CategoryPEAK 14.388 15.545 950.7 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.979 Deviance explained = 97.9%
## fREML = -47090 Scale est. = 0.00050632 n = 19900
In this case the non-linear interaction between time and trial appears to be significant (as there is little noise in the data), though its effect is minimal, judging from the contour levels.
Another simplification of the model, apparent from eye inspection, is to make the smooth over trial a linear term, independent of Category, i.e. substitute s(trial, by = Category) with just trial, or in maths notation \(\beta_2 \cdot r_i\). The fact that the term is linear is suggested by its shape and by the edf values of the respective smooths.
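That simplification might look like this (a sketch; the model name is ours and the remaining terms are kept as before):

```r
# trial enters as a plain linear term, shared across Category levels,
# replacing s(trial, by = Category)
m3.lin <- bam(y ~ Category + s(time, by = Category) + trial +
                ti(time, trial, by = Category), data = curves)
summary(m3.lin)
```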
Just like we moved from lm() to lmer(), we can build mixed-effects GAMs, or GAMMs. Random effects have the same meaning with GAMMs as with LMERs. The difference is that we can model variations in shape, i.e. random smooths.
The mgcv library allows for three different sorts of random terms, which in practice allow for three degrees of complexity:
- random intercepts
- random slopes
- random smooths (nonlinear, with complexity controlled by k)

This is best illustrated in Fig. 6 at p. 10 of this paper by van Rij et al.
Suppose we have curves along the time axis, the fixed factor Category as before, and subject (subjId) as random factor. With mgcv, random intercepts are indicated as s(subjId, bs = "re"). This means that subject-specific effects only vary the global height of the Category-specific curves, i.e. they can shift those curves up or down but do not change their shape. This is the equivalent of (1 | subjId) in lmer notation.
Random slopes… it depends. These are a source of great confusion. If the slope is with respect to a covariate, say time, then slope means slope in the common sense, i.e. a tilted line. But just like with lmer, random slopes can also be specified with respect to a fixed factor, say Category, which will behave in the same manner as with lmer, i.e. the ‘slope’ is the difference between the intercept and the specific level, e.g. PEAK - NO_PEAK.
A random slope with respect to time would be: s(subjId, t, bs = "re"), while a random slope with respect to a fixed factor like Category would be: s(subjId, Category, bs = "re"). The first one is the one represented in the figure from the paper by van Rij. A random slope is equivalent to (0 + t | subjId) in lmer notation. Note the explicit absence of the intercept. This is because, unlike lmer, mgcv has no way to estimate correlations between intercept and slope. If we want to model both random intercept and slope, we need to include both terms in the model, i.e. s(subjId, bs = "re") + s(subjId, t, bs = "re"), and yet the GAMM will not estimate the correlation between the two.
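Put together, a model with both random intercepts and random slopes could be sketched as follows (assuming a curves data frame with y, t, Category and subjId coded as a factor; the model name is ours):

```r
m.rs <- bam(y ~ Category + s(t, by = Category) +
              s(subjId, bs = "re") +    # random intercepts per subject
              s(subjId, t, bs = "re"),  # random slopes over t per subject
            data = curves)
```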
Finally, a random smooth is indicated as: s(t, subjId, bs = "fs"). Note that the first term indicates the covariate, the second the random factor.
Each of the three settings can be further complexified by indicating a by factor, which behaves according to the conventions used in mgcv, i.e. for each level of the by factor an independent term is added. For example, a random intercept can be split by Category with: s(subjId, by = Category, bs = "re"), which adds two random intercepts, one for PEAK, one for NO_PEAK. This achieves the same effect as s(subjId, Category, bs = "re").
A random slope could be: s(subjId, t, by = Category, bs = "re"), where the slope is with respect to t, while Category simply splits the term in two as before.
In the example below we use a random smooth split with respect to Category: s(t, subjId, by = Category, bs = "fs").
A variation of the previous data set is depicted as follows:
The data set contains several y curves for each subject (a subset of subjects and trials in the plot above). Note that:
- the curves vary both across subjects and across levels of Category
- the shape of the first peak is consistent across Category levels within the same subject
- a second peak appears in the PEAK condition, though not obviously related to the one of the first peak

We model all this by introducing random smooth terms. Such terms vary along time, are speaker-dependent, i.e. are added to the fixed effects for each speaker, and come in two versions, one per Category level.
The model in mgcv is expressed as follows (we are not modelling the effect of trial):
m4 <- bam(y ~ Category + s(t, by = Category) +
s(t, subjId, by = Category, bs = "fs", m = 1), data = curves)
summary(m4)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ Category + s(t, by = Category) + s(t, subjId, by = Category,
## bs = "fs", m = 1)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.509038 0.008256 61.653 < 2e-16 ***
## CategoryPEAK 0.072021 0.012002 6.001 1.99e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(t):CategoryNO_PEAK 8.926 8.936 106.89 <2e-16 ***
## s(t):CategoryPEAK 8.932 8.940 163.18 <2e-16 ***
## s(t,subjId):CategoryNO_PEAK 344.434 359.000 64.89 <2e-16 ***
## s(t,subjId):CategoryPEAK 346.545 359.000 68.53 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.88 Deviance explained = 88.2%
## fREML = 11644 Scale est. = 0.10527 n = 33600
The m = 1 argument is usually added to random smooths. The reason is to increase the non-linearity penalty, i.e. a stronger regularisation.
Let’s compare it with a version without the random terms.
m4.fixed <- bam(y ~ Category + s(t, by = Category), data = curves)
summary(m4.fixed)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ Category + s(t, by = Category)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.507969 0.003900 130.2 <2e-16 ***
## CategoryPEAK 0.072263 0.005516 13.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(t):CategoryNO_PEAK 8.991 9 4718 <2e-16 ***
## s(t):CategoryPEAK 8.992 9 4342 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.709 Deviance explained = 70.9%
## fREML = 24830 Scale est. = 0.25542 n = 33600
Note how R-sq and deviance explained are way higher for the model with random terms. In this case we can also directly affirm that the random terms are justified, as they are significant and they are the only difference between the two models.
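A more formal check (a sketch, assuming itsadug is loaded as above): compareML compares the two fits by their (f)REML scores, which is legitimate here since both models share the same fixed effects.

```r
# Compare the fit with and without the random smooth terms
compareML(m4.fixed, m4)
```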
The summed fixed effects look as expected:
plot_smooth(m4, view = "t", plot_all = "Category", col = Category.colors, print.summary = FALSE)
It is interesting to compare the result with the model without random terms:
plot_smooth(m4.fixed, view = "t", plot_all = "Category", col = Category.colors, print.summary = FALSE)
Note how ‘overconfident’ the simpler model is!
Let’s visualise the shape of the random terms, picking out the first 5 subjects, to be compared with the raw data above:
inspect_random(m4, select = 3, cond=list(subjId = 1:5 %>% factor(), Category = "NO_PEAK"), main = "NO_PEAK", col = colorblind_pal()(5), lty = 1, lwd = 2, print.summary = FALSE)
inspect_random(m4, select = 4, cond=list(subjId = 1:5 %>% factor(), Category = "PEAK"), main = "PEAK", col = colorblind_pal()(5), lty = 1, lwd = 2, print.summary = FALSE)
Question: Can you find, by eye inspection, the subject-specific consistency in the change of shape of the first peak across Category levels in the plot above (comparing with the raw data further above)?
Question: Does the GAMM model such consistency explicitly?